On Markov chain Monte Carlo methods for tall data

Authors

Rémi Bardenet

Arnaud Doucet

Christopher C. Holmes

Abstract

Markov chain Monte Carlo methods are often deemed too computationally intensive to be of any practical use for big data applications, and in particular for inference on datasets containing a large number n of individual data points, also known as tall datasets. In scenarios where data are assumed independent, various approaches to scale up the Metropolis-Hastings algorithm in a Bayesian inference context have recently been proposed in machine learning and computational statistics. These approaches can be grouped into two categories: divide-and-conquer approaches and subsampling-based algorithms. The aims of this article are as follows. First, we present a comprehensive review of the existing literature, commenting on the underlying assumptions and theoretical guarantees of each method. Second, by leveraging our understanding of these limitations, we propose an original subsampling-based approach relying on a control variate method which, under regularity conditions, samples from a distribution provably close to the posterior distribution of interest, yet can require less than O(n) data point likelihood evaluations at each iteration for certain statistical models in favourable scenarios. Finally, we emphasize that we have so far only been able to propose subsampling-based methods which display good performance in scenarios where the Bernstein-von Mises approximation of the target posterior distribution is excellent. It remains an open challenge to develop such methods in scenarios where the Bernstein-von Mises approximation is poor.
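The control-variate idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it only shows how a Taylor proxy of each log-likelihood term, expanded around a reference point, can reduce the variance of a subsampled estimate of the log-likelihood difference used in a Metropolis-Hastings acceptance ratio. The one-dimensional logistic model, the reference point `theta_star`, the subsample size, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def log_lik(theta, x, y):
    """Per-datum logistic log-likelihood (1-D feature, no intercept)."""
    z = theta * x
    return y * z - np.logaddexp(0.0, z)

def grad_hess(theta, x, y):
    """Per-datum first and second derivatives of log_lik w.r.t. theta."""
    s = 1.0 / (1.0 + np.exp(-theta * x))
    return x * (y - s), -(x ** 2) * s * (1.0 - s)

def proxy_diff(g, h, theta, theta_new, theta_star):
    """Difference of second-order Taylor proxies around theta_star."""
    d_new, d_old = theta_new - theta_star, theta - theta_star
    return g * (d_new - d_old) + 0.5 * h * (d_new ** 2 - d_old ** 2)

def cv_diff_estimate(theta, theta_new, theta_star, x, y, idx, G, H):
    """Unbiased estimate of sum_i [l_i(theta_new) - l_i(theta)] from idx."""
    n, m = x.size, idx.size
    # Full-data sum of the proxies: O(1) given the precomputed sums G and H.
    proxy_total = proxy_diff(G, H, theta, theta_new, theta_star)
    # Subsampled correction: true difference minus proxy difference.
    xs, ys = x[idx], y[idx]
    g, h = grad_hess(theta_star, xs, ys)
    resid = (log_lik(theta_new, xs, ys) - log_lik(theta, xs, ys)
             - proxy_diff(g, h, theta, theta_new, theta_star))
    return proxy_total + (n / m) * resid.sum()

# Synthetic data (hypothetical; for illustration only).
rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-1.5 * x))).astype(float)

# Reference point, assumed to lie close to the posterior mode.
theta_star = 1.5
g_all, h_all = grad_hess(theta_star, x, y)  # one O(n) pass, done once
G, H = g_all.sum(), h_all.sum()

theta, theta_new = 1.45, 1.55
exact = (log_lik(theta_new, x, y) - log_lik(theta, x, y)).sum()

idx = rng.choice(n, size=100, replace=False)
approx = cv_diff_estimate(theta, theta_new, theta_star, x, y, idx, G, H)
naive = (n / idx.size) * (log_lik(theta_new, x[idx], y[idx])
                          - log_lik(theta, x[idx], y[idx])).sum()
print(f"exact={exact:.4f}  control-variate={approx:.4f}  naive={naive:.4f}")
```

In this sketch the correction term is of third order in the distance to `theta_star`, so the subsampled estimate is accurate only when both states stay near the reference point. That is consistent with the abstract's remark that good performance is tied to scenarios where the Bernstein-von Mises approximation is excellent.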

Similar resources

Slice Sampling as a Markov Chain Monte Carlo Method

This article has no abstract.

Spatial count models on the number of unhealthy days in Tehran

Spatial count data are commonly found in sciences such as environmental science, meteorology, geology and medicine. Spatial generalized linear models based on Poisson (Poisson-lognormal spatial model) and binomial (binomial-logitnormal spatial model) distributions are often used to analyze discrete count data in which spatial correlation is observed. The likelihood function of these models i...

A mixed Bayesian/Frequentist approach in sample size determination problem for clinical trials

In this paper we introduce a stochastic optimization method based on a mixed Bayesian/frequentist approach to a sample size determination problem in a clinical trial. The data are assumed to come from a normal distribution for which both the mean and the variance are unknown. In contrast to the usual Bayesian decision theoretic methodology, which assumes a single decision maker, our method recogni...

Estimation for the Type-II Extreme Value Distribution Based on Progressive Type-II Censoring

In this paper, we discuss the statistical inference on the unknown parameters and reliability function of type-II extreme value (EVII) distribution when the observed data are progressively type-II censored. By applying EM algorithm, we obtain maximum likelihood estimates (MLEs). We also suggest approximate maximum likelihood estimators (AMLEs), which have explicit expressions. We provide Bayes ...

Joint Modeling of Dynamic and Cross-Sectional Heterogeneity: Introducing Hidden Markov Panel Models

Researchers working with panel data sets often face situations where changes in unobserved factors have produced changes in the cross-sectional heterogeneity across time periods. Unfortunately, conventional statistical methods for panel data are based on the assumption that the unobserved cross-sectional heterogeneity is time constant. In this paper, I introduce statistical methods to diagnose ...

Journal:

Journal of Machine Learning Research

Volume 18

Publication date: 2017
